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'  This  paper  focuses  on  algorithms  for  matching  problems  that  run  on  an  EREW  PRAM  with  p 
processors.  Given  is  a  bipartite  graph  with  n  vertices,  m  edges,  and  integral  edge  costs  at  most  N  in 
magnitude. ^n  algorithm  is  presented  for  the  assignment  problem  (minimum  cost  perfect  bipartite 
z  matching)  that  runs  in  G(-v/nmlog(n.Ar)(logp)/p)  time  and  0(m)  space,  for  p  <  m/(y/nlog2n). 
>»This  bound  is  within  a  factor  of  log  p  of  optimum  speed-up  of  the  best  known  sequential  algorithm, 
which  in  turn  is  within  a  factor  of  log  (nN)  of  the  best  known  bound  for  the  problem  without  costs 
(maximum  cardinality  matching).  Extensions  of  the  algorithm  axe  given,  including  an  algorithm  for 
maximum  cardinality  bipartite  matching  with  slightly  better  processor  bounds,  and  similar  results 
for  bipartite  degree-constrained  subgraph  problems  (with  and  without  costs). 


*  This  is  an  expanded  version  of  a  paper  appearing  in  the  Proceedings  of  the  20tk  Annual  ACM 
Symposium  on  Theory  of  Computing ,  1988.  * 
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1.  Introduction. 


Problems  such  as  cardinality  matching,  degree-constrained  subgraphs  and  network  flow  have 
efficient  sequential  algorithms  [L,T]  but  seem  difficult  to  parallelize,  in  the  sense  of  NC  parallelism 
(e.g.,  [GSS]).  This  paper  investigates  parallel  algorithms  from  the  more  practical  viewpoint  of 
speeding  up  the  best  known  sequential  time  bounds.  It  achieves  running  times  that  are  within  a 
logarithmic  factor  of  optimum  speed-up,  using  significant  numbers  of  processors. 

To  state  the  results  we  first  give  some  definitions  and  notation;  more  definitions  are  at  the  end 
of  this  section.  A  matching  on  a  graph  is  a  set  of  vertex-disjoint  edges.  A  free  vertex  is  not  on 
any  matched  edge.  A  maximum  cardinality  matching  has  the  greatest  number  of  edges  possible;  a 
perfect  matching  has  no  free  vertices.  When  every  edge  has  a  numerical  cost,  the  cost  of  a  set  of 
edges  is  the  sum  of  its  edge  costs.  A  minimum  perfect  matching  is  a  perfect  matching  of  smallest 
possible  cost.  The  assignment  problem  is  to  find  a  minimum  perfect  matching  on  a  bipartite  graph. 
This  problem  has  many  practical  applications  [Danj. 

A  PRAM  ( Parallel  Random  Access  Machine)  consists  of  p  synchronized  processors  accessing  a 
common  memory.  On  an  ER.EW  ( Exclusive  Read  Exclusive  Write)  PRAM,  at  most  one  processor 
can  access  a  given  memory  cell  in  any  instruction  cycle.  If  a  sequential  algorithm  runs  in  time  t, 
an  optimum  speed-up  of  this  algorithm  on  a  PRAM  with  p  processors  runs  in  time  0{t/p). 

We  state  resource  bounds  in  terms  of  the  following  parameters;  The  given  graph  has  n  vertices, 
m  edges,  and  integral  edge  costs  at  most  N  in  magnitude.  The  model  of  computation  is  an  EREW 
PRAM  with  p  processors. 

Let  us  first  review  the  best  known  sequential  algorithms  for  bipartite  matching  problems.  For 
maximum  cardinality  matching  the  algorithm  of  HopcToft  and  Karp  runs  in  time  O(-Jnm)  [HK]. 
For  the  assignment  problem  the  best  known  strongly  polynomial  time  bound  is  0(n(m  +  nlogn)), 
achieved  by  the  Hungarian  algorithm  implemented  with  Fibonacci  heaps  [FT].  When  the  costs  are 
integers  of  magnitude  at  most  N  and  N  is  not  huge,  this  can  be  improved:  [GaT87]  gives  a  cost 
scaling  algorithm  that  runs  in  time  0{y/nm\og{nN)).  This  bound  is  within  a  logarithmic  factor 
of  Hopcroft  and  Karp’s  bound  for  the  simpler  problem  of  cardinality  matching. 

Our  main  result  extends  this  bound  to  parallel  computation: 

Theorem  1.1.  For  integral  costs  of  magnitude  at  most  N ,  the  assignment  problem  can  be  solved 
in  time  0{\/nm]og{nN)(\ogp)lp)  and  space  0(m),  for  p  <  m/(y/n\og2n). 

For  p  =  1  this  bound  equals  that  of  [GaT87],  although  the  latter  algorithm  is  simpler.  For 


1  <  p  <  m/(v/n  log  5n)  our  algorithm  speeds  up  the  sequential  algorithm  by  the  almost  optimum 
factor  of  p/logp.  This  gives  a  non-trivial  speed-up  even  for  very  sparse  graphs,  e.g.,  graphs  with 
m  =  O(n).  Using  the  greatest  number  of  processors  allowed,  Theorem  1.1  gives  a  running  time  of 
0(n  log  3n  log  (nlV)). 

Now  let  us  review  previous  parallel  algorithms  for  matching  and  related  problems.  [KUW]. 
[GP]  and  [MW]  give  parallel  algorithms  for  minimum  cost  matching  that  work  even  on  general 
graphs.  These  algorithms  are  probabilistic,  however,  and  use  large  numbers  of  processors.  (The 
time  is  0(  log2n)  with  nmNM(n)  processors  for  [MW];  slightly  fewer  processors  and  slightly  more 
time  for  [KUW]  and  [GP].  Here  Af(n)  >  n2  is  the  sequential  time  needed  to  multiply  two  n  x  n 
matrices.  The  space  for  these  algorithms  is  also  large  since  it  equals  the  processor  count).  Our 
work  is  related  to  the  algorithm  of  Goldberg  and  Tarjan  for  the  more  general  minimum  cost  flow 
problem  [GoT].  A  parallel  version  of  their  algorithm  runs  in  0(n2(  log  n)(  log  nN))  time  using  n 
processors  and  0(n2 )  space. 

Our  assignment  algorithm  can  be  used  to  solve  the  single-source  shortest  path  problem  on 
a  directed  graph  with  arbitrary  edge  lengths,  in  the  bounds  of  Theorem  1.1.  The  assignment 
algorithm  generalizes  to  the  minimum  cost  degree-constrained  subgraph  problem.  The  result  is 
similar  to  Theorem  1.1:  all  n’s  in  the  bounds  of  Theorem  1.1  change  to  U,  the  sum  of  the  degree 
constraints.  These  results  achieve  close  to  optimum  speed-up  of  sequential  algorithms  in  [GaT8~j 
for  the  same  problems. 

The  basic  problem  addressed  by  our  work  is  finding  paths  fast  in  parallel.  Efficient  transitive 
closure  algorithms  use  too  many  processors  for  optimum  speed-up.  Our  solution  involves  extending 
the  notion  of  6-optimality  of  [GoT]  (and  the  equivalent  notion  of  1-optimality  of  [GaT87])  to  r- 
optimality,  a  form  more  suited  to  parallel  computation.  We  also  we  use  the  idea  of  a  reduced  graph 
and  a  path  doubling  technique  to  ensure  that  all  paths  explored  are  short  on  the  average. 

Our  approach  to  the  assignment  problem  simplifies  when  applied  to  the  problem  of  finding  a 
maximum  cardinality  matching  on  a  bipartite  graph,  giving  slightly  better  processor  bounds: 

Theorem  1.2.  A  maximum  cardinality  matching  on  a  bipartite  graph  can  be  found  in  time 
0((y/nm  log p)/p)  and  space  0(m),  for  p  <  m/(y/n logn). 

This  bound  is  within  a  factor  of  log  p  of  an  optimum  speed-up  of  the  best  sequential  algorithm 
[HK].  Using  the  greatest  number  of  processors  allowed,  the  running  time  is  0(n  log 2  n). 

Shiloach  and  Vishkin  [SV]  give  a  parallel  algorithm  for  cardinality  matching.  They  achieve 
almost  optimum  speed-up  but  for  fewer  processors,  p  <  m/n  (fastest  running  time  0(nis  Iogn)). 
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The  algorithm  of  [KC]  has  a  faster  ruiming  time  of  0(n  log  n  log  log  n ),  a  factor  of  log  n /  log  log  n 
better  than  our  fastest  time.  But  transitive  closure  is  used,  whence  the  number  of  processors  (and 
the  space)  is  M (n).  The  above-mentioned  algorithms  of  [KUW],  [GP]  and  [MW]  are  more  efficient 
for  cardinality  matching  but  still  not  close  to  optimum  in  their  use  of  processors  (p  =  nM(n)  for 
[KUW,  GP],  p  =  nmM(n)  for  [MW]). 

Our  cardinality  matching  algorithm  generalizes  to  the  maximum  cardinality  degree-constrained 
subgraph  problem.  For  instance  on  a  bipartite  graph  the  time  is  0((n2^3mlogp)/p)  for  p  <  m/ns /6. 
This  improves  Shiloach  and  Vishkin’s  bound  of  O((nmlog  n)/p)  time  for  p  <  m/n  [SV], 

The  rest  of  this  paper  is  organized  as  follows.  Section  2  presents  our  algorithm  for  maximum 
cardinality  bipartite  matching,  and  along  with  some  extensions.  This  illustrates  our  approach  in  a 
simple  setting.  Section  3  is  devoted  to  the  assignment  problem.  Section  4  discusses  the  extensions 
to  the  shortest  path  problem  and  the  minimum  cost  degree-constrained  subgraph  problem.  This 
section  closes  with  definitions  from  graph  theory. 

The  logarithm  function  log  n  is  to  the  base  two;  log'r.  denotes  the  itfl  power  of  log  n.  It 
is  convenient  to  take  log  1  to  be  one.  We  use  the  following  convention  to  sum  the  values  of 
a  function:  If  /  is  a  real-valued  function  defined  on  elements  and  5  is  a  set  of  elements,  then 

f(S)  =  £{/(*)!*  €  5}. 

The  given  graph  has  vertex  set  V  and  edge  set  E.  In  general  for  a  graph  G ,  V{G)  and  E{G) 
denote  its  vertex  set  and  edge  set,  respectively.  We  use  the  notation  v  £  G  (vw  €  (?)  as  a  shorthand 
for  v  6  V(G)  (vw  €  £(<?))  when  no  confusion  can  arise.  If  H  is  a  subgraph,  an  E-edge  is  an  edge 
in  H  and  a  non-ff-edge  is  not  in  H.  If  the  given  graph  is  bipartite  we  denote  the  bipartition  as  Vo, 
Vi.  Hence  any  edge  joins  Vo  to  Vj;  if  e  is  an  edge,  eo  denotes  its  vertex  in  V0  and  similarly  for  ex. 
For  a  subgraph  ff ,  Vq(E)  and  VX(H)  have  the  obvious  meaning.  In  problems  with  edge  costs,  c(e) 
denotes  the  cost  of  edge  e.  By  our  convention  for  functional  notation,  c(S)  denotes  the  total  cost 
of  a  set  of  edges  5. 

An  st-path  is  a  path  from  vertex  s  to  vertex  t;  two  at-paths  are  vertex  disjoint  if  their  only 
common  vertices  are  s  and  t.  A  directed  graph  is  layered  if  V(G)  can  be  partitioned  into  sets 
Wj,  »  =  0,  such  that  any  edge  goes  from  some  to  W,+i. 

In  a  graph  with  a  matching  M ,  an  alternating  path  (cycle)  is  a  simple  path  (cycle)  whose  edges 
are  alternately  matched  and  unmatched.  An  augmenting  path  P  is  an  alternating  path  joining  two 
distinct  free  vertices.  Augmenting  along  P  means  enlarging  the  matching  to  M  ®  P,  a  matching 
with  one  more  edge. 

If  M  is  a  matching  on  a  bipartite  graph  <?,  the  residual  graph  for  M  (a  term  from  network  flow 
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theory  [T])  is  a  directed  graph  D  that  models  the  augmenting  paths.  The  vertices  of  D  are  those 
in  (G)  plus  two  new  vertices  s  and  t;  the  edges  of  D  are  the  unmatched  edges  of  G ,  directed  from 
Vo  to  Vi;  plus  the  matched  edges  of  G,  directed  oppositely;  plus  edges  from  s  to  each  free  vertex 
of  Vo;  plus  edges  from  each  free  vertex  of  Vi  to  t.  Augmenting  paths  for  M  correspond  one-to-one 
with  .st-paths  in  D. 

Consider  a  multigraph  in  which  each  vertex  v  has  associated  nonnegative  integers  l{v)  and 
tx(t>).  Let  I?  be  a  subgraph,  and  let  d(v)  denote  the  degree  of  vertex  v  in  D  (each  copy  of  an  edge 
incident  to  t;  contributes  one  to  d(v)).  A  degree-constrained  subgraph  ( DCS)  is  a  subgraph  in  which 
each  v  has  f(v)  <  d(v)  <  u(v).  In  a  perfect  DCS  each  v  has  d(v)  =  u(v).  The  number  of  edges  in 
a  perfect  DCS  is  denoted  by  U  =  u(Vo).  A  vertex  is  free  in  D  if  d(v)  <  u(v).  Other  definitions  for 
DCS —  e.g.,  minimum  perfect  DCS,  the  residual  graph,  etc.,  follow  by  analogy  with  matching. 


2.  Maximum  cardinality  matching. 

This  section  introduces  our  approach  in  the  simple  setting  of  cardinality  problems.  It  discusses 
the  problem  of  maximum  cardinality  bipartite  matching,  proving  Theorem  1.2.  This  illustrates  two 
main  ingredients  of  our  algorithms:  efficient  breadth-first  search  techniques  and  the  reduced  graph. 
The  section  begins  by  stating  the  cardinality  matching  algorithm  of  Hopcroft  and  Karp  [HK].  Then 
it  gives  an  efficient  parallel  implementation.  The  section  ends  with  extensions  of  our  algorithm  to 
the  maximum  cardinality  degree-constrained  subgraph  problem. 

Let  G  be  a  bipartite  graph.  Consider  an  arbitrary  matching  M  on  G ,  with  residual  graph  D 
(defined  in  Section  1).  The  level  /(v)  of  a  vertex  v  is  the  length  of  a  shortest  sv-path  in  D  (f(v)  is 
infinite  if  there  is  no  su-path).  The  level  graph  ( for  M)  consists  of  all  directed  edges  xnv  that  have 
w )  =  t(v )  +  1  (with  both  levels  finite).  The  following  algorithm  of  Hopcroft  and  Karp  finds  a 
maximum  cardinality  matching  on  G 

procedure  match. 

Initialize  the  matching  M  to  0.  Then  repeat  the  following  steps  until  the  Search  Step  halts  with 
the  desired  matching. 

Search  Step.  Construct  the  level  graph  for  M  by  doing  a  breadth-first  search  on  the  residual 
graph  D,  starting  from  vertex  s.  If  l(t)  is  infinite  in  D ,  halt  with  the  desired  matching  M . 
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Augment  Step.  Find  a  maximal  set  A  of  vertex-disjoint  st-paths  in  the  level  graph.  Then  for 
each  path  P  €  .4,  augment  the  matching  along  P.  I 

Two  quantities  are  used  to  analyze  the  various  matching  algorithms  in  this  paper: 

I  =  the  number  of  iterations  of  the  loop  of  match; 

A  =  the  total  length  of  all  augmenting  paths  found  by  match. 

For  the  Hopcroft-Karp  algorithm  /  =  0(y/n)  [HK]  and  A  =  O(nlogn)  [ET].  Now  we  show  that, 
on  an  EREW  PRAM  with  p  processors  (1  <  p  <  m),  match  can  be  implemented  in  time 

0((lm/p+  A)  log  p).  (2.1) 

We  start  with  the  data  structure  for  graphs  that  is  used  throughout  this  paper.  The  given 
graph  G  is  represented  by  sequentially-stored  adjacency  lists.  More  precisely,  let  I(v)  denote  the 
set  of  edges  incident  to  a  vertex  v.  For  each  vertex  v ,  the  edges  of  I(v)  are  stored  in  consecutive 
locations;  v  has  pointers  to  the  first  and  last  edges  of  /(u),  and  all  lists  I(v)  are  stored  in  O(m) 
consecutive  locations.  The  two  copies  of  any  edge  are  linked  to  each  other.  Both  copies  of  an 
edge  have  0(1)  fields  for  working  storage  (determined  by  the  algorithm).  In  particular  these  fields 
contain  the  links  for  all  lists  of  edges  used  by  the  algorithm.  This  data  structure  can  be  constructed 
by  one  processor  in  time  0(m  +  n),  given  any  reasonable  input  representation  of  G ;  this  suffices 
for  our  purposes. 

This  data  structure  facilitates  scanning  a  set  of  edges  in  parallel.  For  example,  the  next  p  edges 
incident  to  a  vertex  v  can  be  scanned  concurrently,  processor  :  scanning  the  edge  i  locations  after 
the  current  position  in  I(v).  An  algorithm  for  breadth-first  search  can  be  based  on  this  principle. 
It  finds  the  first  L  levels  of  a  breadth-first  search  starting  from  any  given  set  of  vertices  in  time 

0((m/p  + I)  log  p).  (2.2) 

This  fact  may  be  viewed  as  an  application  of  Brent’s  principle  [B],  since  it  is  obvious  that  each  level 
of  a  breadth-first  search  can  be  done  in  0(1)  time,  assuming  m  processors  and  ignoring  processor 
allocation  problems.  We  will  assume  this  breadth-first  search  routine  for  now.  We  sketch  an 
implementation  below. 

The  Search  Step  is  implemented  with  the  breadth-first  search  routine.  Each  search  stops  when 
vertex  t  is  reached,  or  when  there  are  no  more  vertices  to  scan.  (The  latter  is  true  in  the  last 
iteration,  since  t(t)  is  infinite.) 
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The  Search  Steps  use  total  time  (2.1).  To  see  this  note  that  there  are  I  Search  Steps,  so  the 
first  terms  of  (2.2)  for  all  breadth-first  searches  sum  to  0((/m/p)  log  p).  In  all  searches  except  the 
last,  the  total  number  of  levels  L  searched  is  less  than  A,  by  the  stopping  criterion  of  the  Search 
Step.  The  last  search  explores  0(A)  levels,  since  the  search  paths  are  alternating.  So  the  secondv 
terms  of  (2.2)  sum  to  O(Alogp).  This  gives  the  desired  time  bound. 

The  Augment  Step  is  implemented  to  avoid  backtracking,  using  the  following  idea.  Consider  a 
directed  acyclic  graph  D  with  distinguished  vertices  s  and  t.  The  reduction  R  of  D  is  the  maximal 
subgraph  of  D  such  that  every  vertex  except  t  has  positive  outdegree  (in  R).  Clearly  any  st-path 
of  D  is  in  R.  Further,  an  st-path  in  R  can  be  found  by  a  greedy  strategy:  start  at  s  and  repeatedly 
traverse  an  edge  xy  directed  from  the  most  recently  reached  vertex  x.  The  Augment  Step  uses  the 
reduction  graph,  as  follows. 

For  a  subgraph  B ,  the  notation  V(B)  stands  for  V(H)  -  { s,t }. 

The  Augment  Step  finds  st-paths  one-by-one,  adding  successive  paths  to  A.  Let  L  be  the  level 
graph.  The  Augment  Step  maintains  the  reduction  R  of  the  graph  L  —  V,  (A).  (It  does  this  by 
marking  the  vertices  that  are  in  R.)  It  finds  the  next  st-path  P  using  the  above  greedy  Strategy- 
Starting  with  the  above  most  recently  reached  vertex  i,  to  find  the  next  edge  xy  the  processors 
repeatedly  examine  the  next  p  edges  directed  from  x,  to  find  an  edge  directed  to  a  vertex  y  of  R. 
Using  this  approach  the  time  to  find  all  st-paths  in  the  entire  match  algorithm  is  given  by  (2.1) 
(the  term  0((lm/p)  log  p)  accounts  for  the  time  examining  a  group  of  p  edges  that  do  not  lead  to 
a  vertex  in  R). 

After  an  st-path  P  is  found,  the  vertices  V  (P)  are  deleted  from  R.  Then  R  is  updated  so  that 
each  vertex  has  positive  outdegree.  This  is  essentially  a  breadth-first  search  backwards  from  the 
level  of  t  to  the  level  of  s.  The  search  uses  Cole’s  algorithm  to  sort  p  numbers  in  time  0(  logp)  (to 
keep  track  of  outdegrees  when  edges  are  deleted)  [C].  A  breadth-first  search  that  deletes  p  edges 
uses  time  0((p/p  +  X)logp).  Since  the  preceding  st-path  has  length  L ,  the  time  for  updating  R 
over  the  entire  algorithm  is  given  by  (2.1). 

We  finish  the  discussion  of  Theorem  1.2  by  sketching  the  parallel  breadth-first  search  routine. 
(This  routine  and  its  data  structures  are  used  throughout  the  paper.)  First  observe  that  it  is  easy 
to  do  a  breadth-first  search  of  G,  from  a  given  set  of  vertices  5,  in  time  0(n  logp+  m/p).  The  idea 
is  that  a  parallel  prefix  computation  broadcasts  the  next  vertex  v  to  scan;  the  processors  scan  the 
edges  incident  to  v  in  parallel,  each  processor  building  up  a  “vertex  list”  of  vertices  on  the  next 
level. 

A  slightly  more  involved  procedure  achieves  time  (2.2).  The  search  works  in  L  stages  (for  L 
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the  desired  number  of  levels).  For  /  =  0, . .. ,  L  —  1,  the  Pk  stage  scans  all  vertices  on  level  t  and 
places  all  newly  reached  vertices  on  level  l  +  1.  Define 

mt  =  i  vertex  v  ic  on  level  £}. 

Stage  t  uses  time  0{{ I  +  (m*  4-  mr+i)/p)logp)  time.  (Clearly  this  gives  bound  (2.2)). 

There  are  two  data  structures:  each  processor  t  has  two  linked  lists,  a  scan  list  S(i)  and  a 
vertex  list  V(i).  The  scan  lists  contain  the  edges  to  be  scanned  in  the  Ith  stage  and  the  vertex  lists 
contain  the  vertices  on  level  l  +  1.  More  precisely,  the  scan  lists  partition  the  edges  incident  to 
vertices  on  level  l.  Each  scan  list  specifies  at  most  such  edges.  An  entry  on  a  scan  list  is  a 

triple  ( v,j,k ),  which  corresponds  to  the  jtK  through  kth  edges  (inclusive)  in  the  incident  list  I{v). 
A  vertex  list  is  a  list  of  at  most  fm*/p]  vertices  on  level  l  +  1. 

Stage  l  works  in  two  parts.  The  first  part  scans  the  edges  incident  to  level  l.  Processor  :  scans 
the  edges  on  S(i).  It  adds  newly  reached  vertices  w  to  its  vertex  list  V(i).  Sorting  and  parallel 
prefix  computations  are  used  to  coordinate  the  manipulations  of  vertices  w.  Cole’s  algorithm  is 
used  to  sort  p  numbers  in  time  O(logp)  [C]. 

The  second  part  of  stage  £  uses  the  vertex  lists  to  construct  scan  lists  for  stage  t  +  1.  This  is 
done  in  two  steps.  Step  1  is  a  parallel  prefix  computation  that  calculates  several  quantities  including 
me+i.  In  Step  2,  each  processor  i  constructs  one  or  more  triples  for  each  vertex  v  6  V(i);  the  triples 
specify  how  I(v)  will  be  partitioned  among  scan  lists.  The  construction  uses  the  fact  that  scan 
list  boundaries  occur  every  fm<+i/p]  edges.  Since  any  processor  examines  at  most  \mi/p]  vertices 
and  [m*+i/p"]  boundaries,  the  time  is  as  desired. 

This  completes  the  proof  of  Theorem  1.2.  S 

Now  consider  the  problem  of  finding  a  maximum  cardinality  degree-constrained  subgraph.  Our 
implementation  of  the  Hopcroft-Karp  algorithm  easily  generalizes  to  this  problem.  (We  omit  the 
details  here.  Section  4.2  addresses  the  main  issues,  in  the  context  of  the  v.dghtei  case.) 

Corollary  2.1.  Consider  a  bipartite  multigraph,  where  all  edge  multiplicities  are  at  most  M 
( M  =  1  for  a  graph).  A  maximum  cardinality  degree-constrained  subgraph  can  be  found  in  time 
0(min{VU,  n2/3Af1/3}m/p+  min{U log  U,nV MU}) log p)  and  space  O(m). 

Proof.  The  time  is  expression  (2.1),  using  the  bounds  on  I  and  A  given  in  [ET,FM,  GaT87].  I 
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3.  The  assignment  problem. 


This  section  presents  our  parallel  algorithm  for  the  problem  of  finding  a  minimum  perfect 
matching  in  a  bipartite  graph,  proving  Theorem  1.1.  For  convenience  we  assume  the  given  graph 
G  has  a  perfect  matching.  (The  algorithms  of  Section  4.2  handle  other  versions  of  the  weighted 
matching  problem.)  Also  for  convenience,  in  this  section  n  denotes  |Vo|,  which  is  half  the  number 
of  vertices.  We  present  the  algorithm  in  a  top-down  fashion.  Section  3.1  introduces  r-optimal 
matchings  and  gives  the  basic  routines  of  the  algorithm.  The  two  major  subroutines  are  in  Sections 
3.2  and  3.3. 


3.1.  The  basic  algorithm:  r-optimality. 

Most  algorithms  for  weighted  matching,  including  ours,  use  the  linear  programming  dual  vari¬ 
ables  [Dan],  A  dual  function  is  a  function  y  :  V  — ►  Z  (for  Z  the  set  of  integers);  y(v)  is  called  the 
dual  variable  of  vertex  v.  Our  notational  convention  for  functions  (see  Section  1)  implies  the  fol¬ 
lowing  notation:  For  an  edge  e,  y(e)  -  y(e<j)  +  y(ei)  (since  precisely  speaking,  e  =  {eo,e3}). 
Similarly  if  5  is  a  set  of  edges,  y(S)  -  £{y(e)|e  €  5}.  Observe  that  if  M  is  a  matching, 
y{M )  =  53{y(u)|  vertex  v  is  matched  in  M}. 

The  Hungarian  algorithm  and  other  traditional  approaches  to  weighted  matching  are  based  on 
the  complementary  slackness  condition  for  minimum  perfect  matching  [L] :  A  perfect  matching  M 
has  minimum  cost  if  and  only  if  there  is  a  dual  function  such  that  for  any  edge  e,  y(e)  <  c(e),  with 
equality  holding  for  any  e  €  M .  We  call  such  a  dual  function  an  ( optimvr -)  linear  programming 
dual. 

Our  approach  uses  a  modification  of  linear  programming  duals.  An  r-feasible  matching  consists 
of  a  matching  A* ,  nonnegative  integer  r,  and  dual  function  y  such  that 

y(e)  <  c(e)  +  (if  e  €  Af  then  0  else  1),  e  €  E\  (3.1a) 

c(M)  <  y(M)  +  r.  (3.1b) 

An  r~optimal  ( relaxed-optimal)  matching  is  a  perfect  matching  that  is  r-feasible. 

To  motivate  this  definition,  first  observe  that  dropping  the  if  term  from  (3.1a)  and  setting 

r  =  0  gives  the  linear  programming  duals.  Now  put  back  the  if  term  but  keep  r  =  0.  Tlus 

% 

notion  is  1-optimality,  used  in  [GaT87]  to  design  an  efficient  sequential  algorithm  for  minimum 
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perfect  matching.  The  intuition  is  that  the  if  term  makes  the  cost  of  augmenting  paths  reflect 
their  length  (each  unmatched  edge  contributes  1  to  the  path  length,  and  an  extra  1  to  the  path 
cost;  in  the  '■'•'-uext  of  scaling  the  extra  1  is  significant,  since  costs  are  small).  Because  of  this 
the  alg^r  .^m  tends  to  augment  along  paths  of  short  length,  as  in  the  Hopcroft-Karp  cardinality 
matching  algorithm.  1-optimality  is  similar  to  the  notion  of  c-optimal  flows  in  the  minimum  cost 
flow  algorithm  of  [GoT].  The  parallel  algorithm  presented  here  achieves  the  same  asymptotic  time 
as  [GaT87]  when  it  runs  on  one  processor. 

The  notion  of  1-optimality  does  not  seem  to  lead  to  an  efficient  parallel  algorithm.  1-optimality 
guarantees  a  low  bound  on  the  total  length  of  all  augmenting  paths,  but  some  augmenting  paths 
can  still  be  long.  This  implies  that  an  algorithm  must  explore  long  candidate  augmenting  paths, 
which  seems  hard  to  do  efficiently  in  parallel.  We  use  r-optimality  to  overcome  this  difficulty: 
r-optimality  guarantees  a  low  bound  on  the  total  length  of  all  augmenting  paths  (Lemma  3.5)  and 
also  gives  the  algorithm  the  flexibility  to  invalidate  long  candidate  augmenting  paths. 

We  now  develop  the  properties  of  r-optimality,  and  at  the  same  time  state  the  basic  algorithm. 
We  start  with  the  relation  between  r-optimal  matchings  and  minimum  perfect  matchings;  similar 
results  are  in  [GoT],  [GaT87]. 

Lemma  3.1.  If  some  integer  larger  than  r  +  n  divides  each  cost  c(e)  evenly,  then  any  r-optimal 
matching  is  a  minimum  perfect  matching. 

Proof.  Consider  a  perfect  matching  P.  It  suffices  to  show  that  c(M)  <  c(P)  +  r  +  n,  since  c(M) 
and  c(P)  are  both  multiples  of  the  integer  hypothesized  in  the  lemma.  From  (3.16),  c{M)  <  y(V)+r; 
from  (3.1a),  y(V)  <  c(P)  +  n.  Combining  these  gives  the  desired  inequality.  I 

The  algorithm  is  stated  using  three  integer  paramjters, 

r,  6  =  3r  +  5  n,  g. 

The  value  of  these  parameters  is  chosen  in  Section  3.2  (specifically  we  choose  r, b  =  0(n),  g  = 
@(logn);  r  is  the  parameter  for  r-optimality). 

The  main  routine  of  the  algorithm  scales  the  costs.  It  first  computes  a  new  cost  c(e)  for  each 
edge  e,  equal  to  r  +  n  +  1  times  the  given  cost.  Consider  each  c(e)  to  be  a  signed  binary  number 
±6162  . . .  6^  of  k  =  [log (r  +  n  +  l)fVJ  +  1  bits.  The  routine  maintains  a  variable  c(e)  for  each  edge 
e,  equal  to  its  cost  in  the  current  scale.  The  routine  initializes  each  c(e )  to  0  and  each  y{v)  to  0. 
Then  it  executes  the  following  loop  for  index  s  going  from  1  to  k: 
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Double  Step.  For  each  edge  e,  c(e)  *-  2c(e)  +  (signed  bit  6,  of  c(e)).  For  each  vertex  v, 
y{v)  «-  2y(t>)  -  1. 

Match  Step.  Call  the  scale-match  routine  to  find  an  r-optimal  matching  for  costs  c(e).  I 

Lemma  3.1  shows  that  the  main  routine  halts  with  a  minimum  perfect  matching.  Each  iteration 
of  the  loop  is  called  a  scale.  Clearly  the  total  time  is  <3(log((r  +  n)N))  times  the  time  for  one 
scale.  Note  that  the  entire  algorithm  runs  in  the  desired  time  bound  if  each  scale  runs  in  time 

0{(y/nm/p  +  nlog2n)logp).  (3.2) 

This  follows  since  as  noted  above  we  will  choose  r  =  0(n).  The  time  for  the  Double  Step  is  0{m/p). 

The  scale-match  routine  transforms  costs  so  that  they  are  small  integers.  (This  is  for  conceptual 
convenience.)  It  changes  the  cost  of  each  edge  e  to  c(e)  -  y(e);  then  it  calls  the  match  routine  on 
these  costs  to  find  an  r-optimal  matching  M  with  duals  y';  then  it  constructs  the  new  dual  function 
y  -r  j/ ,  where  y  is  the  dual  function  before  the  call  to  match.  The  time  for  these  transformations  is 
0(m/p+  logp)  (a  parallel  prefix  computation  is  used  to  broadcast  dual  values  y(r);  each  processor 
uses  at  most  two  duals  that  another  processor  uses). 

Clearly  when  scale-match  terminates,  M  with  the  new  duals  is  an  r-optimal  matching  for  cost 
function  c.  Furthermore,  the  costs  that  scale-match  inputs  to  match  have  these  properties: 

(a)  The  costs  are  integers  -1  or  larger. 

(b)  There  is  a  perfect  matching  of  cost  at  most  2r  +  3 n. 

Property  (a)  follows  from  the  fact  that  the  Double  Step  changes  costs  and  duals  so  that  each  edge 
e  has  y(e)  —  1  <  c(e).  Next  we  show  that  M,  the  r-optimal  matching  found  in  the  previous  scale, 
satisfies  property  (6)  ((6)  is  obvious  in  the  first  scale).  For  any  edge  e  E  M,  let  p(e)  be  the  value 
e(e)  —  y(e)  from  the  previous  scale.  After  the  Double  Step,  2 p(e)  -f  3  >  c(e)  -  y(e).  Hence  e  costs 
at  most  2 p{e)  F  3  in  the  costs  for  match.  The  conclusion  for  M  follows. 

In  the  match  routine,  an  edge  e  is  eligible  if  it  is  matched  or  constraint  (3.1a)  holds  with 
equality.  The  match  routine  augments  the  matching  along  paths  of  eligible  edges.  (To  motivate 
this,  think  of  (3.1a)  as  placing  a  lower  bound  on  c(e).  Then  an  unmatched  eligible  edge  has  smallest 
cost  possible,  and  so  using  it  in  an  augmenting  path  is  desirable.)  If  there  is  no  augmenting  path 
of  eligible  edges,  match  adjusts  the  duals  to  create  one.  More  precisely  match  works  as  follows. 


procedure  match. 


Initialize  all  duals  y(v)  to  0  and  matching  M  to  0.  Then  repeat  the  following  steps  until  the 
Augment  Step  halts  with  the  desired  r-optimal  matching. 

Augment  Step.  Find  a  maximal  set  A  of  vertex-disjoint  augmenting  paths  of  eligible  edges.  For 
each  path  P  €  A,  augment  the  matching  along  P,  and  for  each  vertex  w  €  Vj(P),  decrease  y(w ) 
by  1.  If  the  new  matching  M  is  perfect,  halt. 

Search  Step.  Do  a  Relaxed  Hungarian  Search  (see  below)  to  adjust  the  duals,  maintaining  r- 
feasibilny,  and  create  an  augmenting  path  of  eligible  edges.  I 

To  analyze  match ,  we  must  first  give  some  details  of  the  Search  and  Augment  Steps  (the 
Search  Step  is  described  complete!,  in  Section  3.2;  the  Augment  Step  is  in  Section  3.3).  The 
Relaxed  Hungarian  Search  is  a  modification  of  the  Hungarian  search  done  in  bipartite  matching 
(the  latter  is  essentially  Dijkstra’s  shortest  path  algorithm  [L,T]).  The  Relaxed  Hungarian  Search 
changes  dual  values  in  two  wavs;  dual  adjustments,  which  are  also  done  in  the  ordinary  Hungarian 
search,  and  relax  operations,  which  are  new.  Each  dual  adjustment  calculates  a  positive  integer  S 
and  increases  or  decreases  various  dual  values  by  6,  so  as  to  preserve  r- feasibility  and  eventually 
create  an  augmenting  p^th  of  eligible  edges.  A  relax  operation  does  not  create  any  eligible  edges. 

At  any  point  in  match  define 

/  =  the  number  of  free  vertices  in  Vo; 

A  =  the  sum  of  all  dual  cjustment  quantities  6  in  all  Hungarian  searches  so  far. 

(A  is  defined  with  respect  to  the  current  execution  of  match.)  The  duals  are  maintained  so  that 
any  free  vertex  v  has 

y(v)  =  if  v  €  Vo  then  A  else  0.  (3.3) 

Now  we  analyze  maich.  First  observe  that  it  is  correct,  specifically:  (t)  it  maintains  r-feasibility, 
and  (ti)  it  halts  with  M  an  r-optimal  matching.  Property  (t)  holds  after  the  initialization  (by 
property  (a)  of  the  costs  for  match).  It  is  part  of  the  specification  of  the  Relaxed  Hungarian 
Search.  Hence  we  need  only  consider  an  Augment  Step.  It  decreases  duals  so  that  y(e)  =  c(e) 
for  every  newly  matched  edge  e.  This  implies  that  (3.1a)  holds.  It  also  implies  (3. It)  (since  every 
previously  matched  edge  satisfied  y(e)  <  c(e)).  Now  consider  property  (it).  If  M  is  not  perfect  but 
G  has  a  perfect  matching,  the  Search  Step  creates  an  augmenting  path  of  eligible  edges.  Hence  (it) 
eventually  holds.  (If  G  does  not  have  a  perfect  matching,  this  is  eventually  detected  in  the  Search 
Step.) 
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The  efficiency  analysis  starts  with  a  fact  similar  to  the  key  result  in  the  analysis  of  the  Hopcroft- 
Karp  algorithm. 

Lemma  3.2.  At  any  time  during  match ,  /A  <  b. 

Proof.  At  any  point  in  match  let  M  be  the  current  matching,  and  let  M  *  be  a  minimum  perfect 
matching.  Consider  the  expression 

y  =  y(M’)  -  y(M). 

M“  ©  M  consists  of  alternating  cycles  plus  exactly  /  augmenting  paths.  Hence 
Y  =  y({t/|t/  is  free  in  M })  =  /A,  by  (3.3).  On  the  other  hand  (3.1)  implies  Y  <  c(M’)+n-c(M)  + 
r.  Properties  (a)-(6)  of  the  costs  for  match  imply  that  the  last  expression  is  at  most  3r  +  5n  =  6. 1 

Define  the  quantities  I  and  A  as  in  Section  2.  In  the  definition  of  A,  measure  the  length  of  an 
alternating  path  by  the  number  of  unmatched  edges.  For  1  <  »  <  n  define 

Aj  =  the  value  of  A  during  the  itK  augmentation. 

(The  ith  augmentation  is  when  match  augments  along  the  ith  augmenting  path.)  These  quantities 
are  bounded  as  follows. 

Lemma  3.3.  I<2y/b+l. 

Proof.  First  we  show  that  a  Hungarian  search  S  increases  A  by  at  least  one.  It  suffices  to  show 
that  S  does  a  dual  adjustment  (since  any  dual  adjustment  quantity  6  is  a  positive  integer).  Search 
S  does  a  dual  adjustment  unless,  when  it  starts,  there  is  an  augmenting  path  P  of  eligible  edges. 
Clearly  P  intersects  some  augmenting  path  of  A  of  the  preceding  Augment  Step.  It  is  easy  to  see 
that  P  contains  an  unmatched  edge  e,  such  that  ei  but  not  «o  is  in  an  augmenting  path  of  A.  But 
e  is  ineligible  after  the  Augment  Step  decreases  y(e j).  Thus  P  does  not  exist,  and  S  does  a  dual 
adjustment. 

This  implies  that  at  most  \/h  Search  Steps  end  with  A  <  %/6.  If  a  Search  Step  ends  with 
A  >  Vb  then  /  <  y/b  by  Lemma  3.2.  There  can  be  at  most  /  more  iterations,  since  each  Augment 
Step  enlarges  the  matching.  I 


Lemma  3.4.  £?=i  ^  &  +  hlogn. 


Proof.  Since  /  =  n  — i+1  right  before  the  itk  augmentation,  Lemma  3.2  implies  A i  <  6/(n-»+l). 

I 


Lemma  3.5.  A  <  n  +  b  +  hlogn. 

Proof.  Let  Pj  be  the  path  used  in  the  itk  augmentation.  Let  Li  be  its  length,  measured  as  the 
number  of  unmatched  edges;  let  Mi  be  the  matching  after  augmenting  along  P,.  (Note  that  Mo  =  0 
and  c(Mq)  =  0.)  Using  (3.3),  the  definition  of  “eligible”,  and  (3.1a), 

=  y(Pi  -  Mi -0  -  y(Pi  n  A/,-i)  >  l i  +  c(Mi)  -  c(Mi- 1). 

Summing  these  inequalities  implies  A,  >  A  -  n  (by  property  (a),  c(M„)  >  — n).  Now  Lemma 
3.4  implies  the  desired  bound.  I 


3.2.  Relaxed  Hungarian  Search. 

This  section  first  describes  ordinary  Hungarian  search,  modified  to  accommodate  the  concepts 
of  our  paper  —  eligible  edges  and  r-feasiblity.  This  version  of  Hungarian  Search  is  what  is  needed 
in  an  efficient  one-processor  algorithm.  The  remainder  of  the  section  describes  Relaxed  Hungarian 
Search  and  presents  its  analysis. 

Ordinary  Hungarian  search  has  two  main  components,  the  search  forest  T  and  the  dual  ad¬ 
justment  operation.  Recall  that  the  purpose  of  a  Hungarian  search  is  to  create  an  augmenting  path 
of  eligible  edges,  by  adjusting  the  duals  in  a  way  that  preserves  r-feasibility.  The  augmenting  path 
is  found  by  growing  a  forest  T .  The  roots  of  7  axe  the  free  vertices  of  VJ>;  any  path  from  a  vertex 
to  a  root  in  7  is  an  alternating  path  of  eligible  edges.  Hence  when  7  contains  a  free  vertex  of  V\ 
it  contains  the  desired  augmenting  path. 

If  a  maximal  forest  7  does  not  contain  an  augmenting  path,  a  dual  adjustment  can  be  done. 
Define  the  dual  adjustment  quantity 


b  =  min{c(e)  +  1  -  y(e) |e<>  6  7,  ej  g  7). 


Each  v  €  7  has  its  dual  y(v)  increased  by  6x  (if  v  6  Vo  then  1  else  -1).  This  adjustment 
preserves  r-feasibility,  since  it  does  not  change  y(e)  when  e  has  both  vertices  in  7 .  Furthermore, 
any  edge  e  achieving  the  above  minimum  becomes  eligible,  and  can  be  added  to  7 ' . 

The  Hungarian  search  alternates  between  growing  7  and  doing  dual  adjustments.  Specifically, 
7  is  grown  until  it  is  maximal:  An  eligible  edge  t  with  eo  6  7  and  t\  7  is  added  to  7  whenever 
possible;  if  t\  is  not  free,  its  matched  edge  t\t\  is  also  added  to  7 .  If  the  maximal  7  does  not 
contain  an  augmenting  path,  a  dual  adjustment  is  done.  Then  the  process  repeats.  Eventually 
7  contains  the  desired  augmenting  path  of  eligible  edges,  at  which  point  the  ordinary  Hungarian 
search  halts. 

The  ordinary  Hungarian  search  is  adequate  for  p  =  1  (it  is  used  in  the  one-processor  algorithm 
of  [GaT87]).  It  is  not  efficient  for  our  approach  to  parallel  processing,  however,  for  the  following 
reason.  As  illustrated  in  Section  2,  our  approach  charges  search  time  to  augmenting  path  length. 
But  ordinary  Hungarian  search  leads  to  two  circumstances  where  search  time  can  be  much  longer 
than  augmenting  path  length:  First,  a  search  might  grow  a  forest  7  with  long  paths,  yet  after 
dual  adjustments,  find  a  short  augmenting  path.  Second,  when  the  search  halts  there  may  be  long 
alternating  paths  of  eligible  edges  that  the  Augment  Step  must  explore,  yet  these  paths  may  not 
lead  to  any  augmentations. 

The  Relaxed  Hungarian  Search  remedies  this  using  the  relax  operation.  To  relax  a  set  of 
matched  vertices  5  C  V0  means  to  decrease  y(v)  by  1  for  each  v  £  S-  The  relax  operation  makes 
every  unmatched  edge  incident  to  S  ineligible.  Concerning  r-feasibility,  note  that  a  relax  operation 
preserves  (3.1a).  It  decreases  y(M)  by  |5|,  so  (3.1b)  places  a  limit  on  relax  operations. 

Relax  operations  can  be  used  to  overcome  the  above  two  difficulties,  as  follows.  First  the 
algorithm  can  limit  the  time  to  grow  7:  If  a  parallel  step  adds  just  a  small  number  of  vertices  to 
7 ,  the  algorithm  relaxes  those  vertices,  preserving  (3.1  b),  yet  cutting  off  the  growth  of  7.  Second, 
after  the  parallel  Hungarian  search  finds  an  augmenting  path,  there  may  still  be  eligible  edges  to 
add  to  7.  The  algorithm  continues  to  add  vertices  to  7  in  parallel,  until  some  parallel  step  adds 
just  a  small  number  of  vertices.  At  that  point  the  algorithm  relaxes  those  vertices,  cutting  off 
further  growth  as  desired.  We  shall  see  that  these  two  remedies  lead  to  an  efficient  algorithm. 

Before  presenting  the  algorithm  in  detail  note  that  the  following  modification  of  the  relax 
operation  might  be  more  efficient  in  practice:  decrease  y(v)  only  if  v  £  S  is  incident  to  an  unmatched 
eligible  edge.  Our  analysis  applies  without  change  to  this  modification.  For  definiteness,  the  rest 
of  the  paper  assumes  that  the  simpler  relax  operation  given  above  is  used. 

v 

Now  we  describe  Relaxed  Hungarian  Search.  The  search  initializes  the  search  forest  7  to 
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contain  the  free  vertices  of  V0.  Then  it  repeats  the  following  two  steps  until  the  Adjust  Step  halts 
with  T  as  desired.  (Recall  that  /  denotes  the  number  of  free  vertices  in  V0  and  g  is  a  parameter  of 
the  algorithm.) 

Adjust  Step.  Set  Wi  *-  {ej|  some  eligible  edge  e  has  eo  €  F,  Ci  t  f),  Wo  *—  {eo|ej  €  Wi,  e  € 
M }.  If  W\  =  0  and  T  contains  a  free  vertex  of  Vj,  halt.  If  Wj  =  0  and  T  does  not  contain  a  free 
vertex  of  Vj,  do  a  dual  adjustment  and  repeat  this  step. 

Grow  Step.  For  each  vertex  w  £W\,  add  an  eligible  edge  vw  (v  €  ?)  to  jF ,  and  if  w  is  not  free, 
add  the  matched  edge  ww'  to  T .  If  |W0|  <  f /g  then  relax  Wo-  I 

Let  us  clarify  the  flow  of  control  in  this  algorithm.  First  consider  the  Adjust  Step.  A  dual 
adjustment  in  this  step  is  well-defined,  since  it  is  done  only  when  T  does  not  contain  a  free  vertex 
of  Vj.  A  dual  adjustment  ensures  that  the  next  Adjust  Step  has  W\  0;  hence  the  Adjust  Step 
repeats  at  most  once. 

Next  consider  the  Grow  Step.  After  it  adds  edges,  the  eligible  edges  L  joining  a  vertex  of  Vq(T) 
to  a  vertex  not  in  T  are  all  incident  to  Wo-  If  |Wo|  >  f/g,  the  next  Adjust  Step  and  Grow  Step 
process  these  edges  L  (if  L  ^  0).  If  \W0\  <  f/g  the  relax  operation  makes  the  edges  L  ineligible; 
hence  the  next  Adjust  Step  either  halts  or  does  a  dual  adjustment. 

We  define  two  more  quantities  for  the  analysis: 

R  =  the  total  decrease  in  duals  caused  by  relax  operations; 

H  =  the  total  number  of  iterations  in  all  Hungarian  searches. 

Here  an  iteration  is  defined  as  an  execution  of  an  Adjust  Step  plus  the  following  Grow  Step  (if 
it  exists).  Both  quantities  are  defined  with  respect  to  the  current  execution  of  match.  We  shall 
choose  the  parameter  r  to  be  an  upper  bound  on  R. 

The  correctness  of  the  Relaxed  Hungarian  Search  amounts  to  these  properties:  (»)  it  preserves 
(3.3);  (tt)  it  preserves  r-feasibility;  (tit)  it  eventually  halts  having  created  an  augmenting  path  of 
eligible  edges. 

For  (t),  the  dual  of  a  free  vertex  v  changes  only  in  a  dual  adjustment.  If  v  €  Vo  then  every 
dual  adjustment  increases  g(v),  so  y(v)  =  A.  If  v  €  Vj  then  no  dual  adjustment  changes  y(v),  so 
y(v)  =  0. 

Property  (tt)  was  essentially  verified  in  the  above  discussion.  For  (3.16),  we  have  observed  that 
a  dual  adjustment  does  not  change  y(M);  an  Augment  Step  does  not  increase  c(M)  -  y(M).  Relax 
operations  decrease  y(M).  Hence  our  choice  of  r  will  guarantee  r-feasibility. 
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For  (in),  as  already  noted  an  Adjust  Step  repeats  at  most  once.  Then  a  Grow  Step  with 
Wi  0  is  executed.  Hence  every  iteration  of  Adjust  and  Grow  enlarges  F.  Thus  the  routine 
eventually  halts.  When  it  halts,  ?  contains  a  free  vertex  of  Vj.  Hence  T  contains  an  augmenting 
path  of  eligible  edges.  (Note  that  relax  operations  do  not  destroy  the  eligibility  of  edges  in  T .) 

We  establish  two  other  properties  that  are  needed  by  the  Augment  Step  (Section  3.3).  The 
first  is  that  T  contains  all  vertices  that  are  on  an  augmenting  path  of  eligible  edges.  This  follows 
since  the  search  halts  with  Wj  =  0. 

For  the  second  property,  first  recall  that  the  Augment  Step  of  the  cardinality  matching  algo¬ 
rithm  relies  on  the  fact  that  the  level  graph  is  layered.  In  minimum  cost  matching  the  graph  of 
eligible  edges  is  not  layered.  This  makes  the  Augment  Step  more  difficult.  The  eligible  edges  have 
the  following  weaker  property  (similar  to  [GoT]  for  network  flow). 

Lemma  3.6.  In  match  there  is  never  an  alternating  cycle  of  eligible  edges. 

Proof.  We  proceed  by  contradiction.  Suppose  there  is  an  alternating  cycle  of  eligible  edges  C.  C 
does  not  exist  after  the  initialization  of  match,  since  there  are  no  matched  edges. 

C  is  not  created  in  a  Relaxed  Hungarian  Search,  for  the  following  reasons:  A  relax  operation 
does  not  create  an  eligible  edge,  so  it  does  not  create  C.  A  dual  adjustment  does  create  eligible 
edges  e  £  M ,  where  eo  €  T,  e\  £  T.  If  C  contains  such  an  edge,  it  also  contains  an  edge  /  ^  M 
with  fo  £  T,  fi  €  T.  But  /  is  ineligible  after  the  dual  adjustment. 

Similar  reasoning  applies  when  the  Augment  Step  creates  new  matched  edges  and  changes 
duals.  I 

Now  we  analyze  the  efficiency  of  the  Relaxed  Hungarian  Search. 

Lemma  3.7.  H  —  0(b  +  gn  logn). 

Proof.  There  are  three  possibilities  for  an  iteration:  (i)  It  adds  at  least  f/g  vertices  to  Vo{J~). 
(it)  It  is  the  last  or  next-to-last  iteration  in  its  Hungarian  search,  (tit)  The  next  iteration  does  a 
dual  adjustment.  These  possibilities  are  exhaustive  since  if  (i)  does  not  hold  and  the  Adjust  Step 
does  not  halt,  the  Grow  Step  does  a  relax  operation,  making  Wi  =  0  in  the  next  iteration. 

Possibility  (it)  occurs  O(Vb)  times  by  Lemma  3.3.  Possibility  (iii)  occurs  at  most  b  times, 
since  Lemma  3.2  implies  that  the  number  of  dual  adjustments  is  at  most  b.  Possibility  (i)  clearly 
occurs  at  most  gn/f  times  in  a  given  search.  Each  Hungarian  search  has  a  distinct  value  of  /, 
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since  each  Augment  Step  after  the  first  does  at  least  one  augment.  Thus  (»)  occurs  less  than 
EJ.1  gnf f  =  0(gn  log  n)  times,  as  desired.  I 

Lemma  3.8.  R  <  (4blogn)/g. 

Proof.  Consider  a  Search  Step  that  starts  with  /  free  vertices,  whose  dual  adjustment  quantities 
6  sum  to  some  value  d.  The  relax  operations  cause  a  total  decrease  in  duals  of  at  most  2  fd/g.  To 
see  this,  observe  that  a  relax  operation  that  is  not  the  last  is  followed  by  a  dual  adjustment.  Hence 
there  are  at  most  d  +  1  relax  operations,  that  decrease  duals  by  at  most  (d  +  1  )f/g  <  2 df / g . 

Thus  R  <  2 fd/g,  where  the  summation  is  over  all  Hungarian  searches.  Observe  that  fd 

(summation  over  all  Hungarian  searches)  is  precisely  £"=1  A <.  This  follows  since  the  duals  of  free 
vertices  are  changed  only  by  dual  adjustments.  Now  Lemma  3.4  implies  the  desired  bound.  I 

The  lemma  and  the  definition  of  b  imply  that  r  can  be  chosen  to  be  any  value  satisfying  the 
inequality  r  >  4(3r  +  5n)logn/<7.  Hence  choose 

r  =  2n,  b  =  lln,  g  =  24flognj. 

This  implies  that  the  number  of  scales  is  0(  log  (nfV)),  and 

I  =  0(\/n),  A  =  0(nlogn),  H  =  0(n  log2n). 

In  the  timing  analysis  we  have  assumed  that  all  arithmetic  operations  use  0(1)  time.  To  justify 
this  we  show  that  each  dual  y(v)  has  magnitude  0(n2N).  Since  the  input  requires  a  word  size  of 
at  least  max{logjV,  logn}  bits,  the  dual  variables  can  be  stored  in  at  worst  triple-word  integers. 

To  show  this  first  define  Y,  as  the  largest  magnitude  of  a  dual  value  y(i>),  v  £  V0 ,  during 
the  sth  scale,  match  increases  y(v)  by  at  most  A  <  b,  and  decreases  it  by  at  most  r  <  b.  Thus 
Y.  <  2 Y,  +  6-1,  and  Yo  =  0.  Hence  Y,  <  (2*  —  1)(6  -  1)  =  0(n2fV).  Hence  the  duals  of  Vi,  satisfy 
the  desired  bound.  When  match  changes  the  dual  of  a  vertex  v  £  Vj,  it  preserves  the  relation 
y(vv')  =  c(w'),  for  vv'  £  M.  So  the  duals  of  Vj  satisfy  the  desired  bound. 

It  remains  to  describe  the  parallel  implementation  of  the  Relaxed  Hungarian  Search.  It  can 
be  implemented  so  that  the  total  time  for  all  searches  is 

0((/m/p  +  6  +  H)logp)  (3.4) 

which  is  within  the  desired  bound  (3.2).  Here  we  outline  the  ideas  of  the  parallel  implementation, 
leaving  details  to  the  reader. 
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First  as  in  the  Hungarian  algorithm,  it  is  most  efficient  to  keep  track  of  the  dual  variables 
by  offsets.  That  is,  the  algorithm  maintains  the  value  of  A.  When  a  vertex  t;  is  added  to  7,  the 
current  values  of  A  and  y(v)  are  recorded  as  A°(t>)  and  y°(v),  respectively.  At  any  later  time  the 
value  of  y(v)  can  be  computed  as  y°(»)  +  (A  -  A°(t>))  x  (if  v  €  Vo  then  1  else  —  1).  In  a  dual 
adjustment  the  value  A  gets  changed,  but  no  work  need  be  done  for  the  vertices  in  7. 

Second,  the  algorithm  maintains  lists  of  edges  that  may  be  used  in  future  Grow  Steps.  The 
idea  is  similar  to  one  proposed  by  Dial  for  Dijkstra’s  algorithm  ([Dia];  see  also  [G85],  [W]).  For 
one  processor  the  details  are  as  follows.  Each  unmatched  edge  t  with  eo  but  not  t\  in  7  has  a 
value  A(e),  such  that  if  A  reaches  the  value  A(e)  and  ej  is  still  not  in  7 ,  then  edge  c  has  become 
eligible  and  ei  can  be  added  to  7.  The  value  A(e)  is  easily  calculated  when  eo  is  added  to  7 as 
A(e)  =  A  +  c(e)  +  1  —  y(e).  For  each  possible  value  A  (0  <  A  <  b)  the  algorithm  maintains  a  list 
L( A)  containing  all  edges  e  with  A(e)  =  A.  In  the  Adjust  Step  the  set  is  found  by  examining  the 
edges  e  on  L( A)  (for  the  current  value  of  A).  A  dual  adjustment  is  done  by  repeatedly  increasing 
A  by  one  until  X(A)  contains  an  edge  e  with  ei  ^  V(7).  In  the  Grow  Step  each  edge  e  incident  to 
Wo  is  added  to  the  appropriate  list  L( A(e)).  Note  that  A(e)  is  calculated  after  any  relax  operation, 
and  there  is  no  need  to  add  e  if  A(e)  >  6. 

To  implement  this  in  parallel  we  allocate  storage  in  “blocks”  to  allow  edges  on  a  list  1(A)  to 
be  scanned  in  parallel.  Specifically  a  block  consists  of  p  +  1  consecutive  words  of  memory.  Define 
a  =  |Vo|.  The  algorithm  allocates  space  for  s  +  1+  [m/pj  blocks.  The  total  space  for  this  is  O(m), 
since  sp  <  m  by  assumption  on  the  number  of  processors.  Blocks  are  used  as  needed.  The  unused 
blocks  are  consecutive  in  memory.  The  algorithm  keeps  a  pointer  to  the  first  unused  block  so  that 
the  next  block  can  be  allocated  when  needed. 

The  algorithm  uses  a  system  of  lists  L'(d),  for  0  <  d  <  a.  Each  list  L'(d)  consists  of  one  or 
more  blocks.  Each  block  stores  k  edges  in  its  first  k  words,  for  some  k  <  p;  if  k  =  p  then  the  p+ 
word  of  the  block  points  to  the  next  block  of  L'(d).  If  A  is  in  the  range  is  <  A  <  (t  +  l)s  then  for 
9  <  d  <  a,  list  L'(d)  corresponds  to  the  above  list  L(is  +  d);  list  L'(s)  corresponds  to  the  union  of 
all  lists  L(ia  +  d),  d  >  a.  Since  at  any  time  a  given  edge  is  in  at  most  one  block,  the  number  of 
blocks  used  is  as  given  above. 

This  data  structure  allows  the  edges  of  a  list  to  be  scanned  in  parallel.  Specifically  to  scan  the 

edges  in  a  block,  processor  i  accesses  the  edge  stored  in  the  ith  word  of  the  block.  Similarly  edges 

are  added  to  a  list  in  parallel  by  placing  them  in  consecutive  locations  at  the  end  of  the  last  block. 

If  this  last  block  does  not  have  sufficient  space  a  new  block  is  allocated  and  added  to  the  list. 

% 

Using  sorting  and  parallel  prefix  computations  as  in  Section  2,  the  time  ibi  all  Adjust  and 
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Grow  Steps  is  given  by  (3.4). 

The  last  detail  concerns  the  processing  when  A  takes  on  a  value  that  is  a  multiple  of  s.  Note 
from  the  definition  of  the  data  structure  that  at  this  point  all  lists  L'(d),  0  <  d  <  s,  must  be 
reinitialized,  using  edges  currently  on  L'{s).  By  Lemma  3.2,  this  reinitialization  occurs  at  most 
f6/sl  <  jVf>l  times.  Using  sorting  and  parallel  prefix  computations  the  total  time  for  this  is 
0(\/nTn(\ogp)/p),  as  desired. 


3.3.  The  Augment  Step. 

This  section  describes  the  Augment  Step  of  match.  Consider  the  Augment  Step  for  some  value 
A.  Let  A&  denote  the  total  augmenting  path  length  in  this  Augment  Step.  The  algorithm  of  this 
section  uses  time 

0((m/p+  AA)Iogp).  (3.5) 

The  bounds  on  I  and  A  together  with  (3.5)  imply  that  the  total  time  for  all  Augment  Steps  is  less 
than  the  desired  bound  (3.2). 

The  Augment  Step  works  on  the  residual  graph  of  the  graph  of  eligible  edges.  This  directed 
graph  is  acyclic,  by  Lemma  3.6.  Thus  it  is  easy  to  see  that  the  Augment  Step  amounts  to  an 
algorithm  for  the  following  problem:  Given  a  directed  acyclic  graph  D  with  distinguished  vertices 
s  and  t,  find  a  maximal  set  A  of  vertex  disjoint  at-paths.  This  is  the  same  problem  as  in  Section 
2,  but  now  the  graph  is  acyclic  instead  of  layered.  We  present  an  algorithm  for  this  problem. 

For  any  graph  D  as  in  our  problem,  its  reduction  R  is  the  subgraph  induced  by  the  vertices  that 
are  on  at-paths.  Equivalently  R  is  the  maximal  subgraph  of  D  such  that  every  vertex  except  t  has 
positive  outdegree  and  every  vertex  except  s  has  positive  indegree.  (Here  indegree  and  outdegTee 
refer  to  degrees  in  R.  Section  2  uses  a  weaker  notion  of  reduction.)  Note  that  for  any  vertex  v  of 
R,  a  ut-path  in  R  can  be  found  by  starting  at  v  and  repeatedly  traversing  an  edge  from  the  most 
recently  reached  vertex.  An  sv-path  can  be  found  similarly. 

As  in  Section  2,  for  a  subgraph  H,  V(H)  stands  for  V(H)  —  {s,t}. 

The  algorithm  maintains  the  graph  R  as  the  reduction  of  D—  V(A).  As  in  the  Augment  Step 
of  Section  2,  the  algorithm  repeatedly  finds  an  st-path  P,  adds  it  to  A,  and  updates  R  by  deleting 
V  (P)  and  all  vertices  whose  indegree  or  outdegree  drops  to  zero.  The  difficulty  in  this  approach  is 
that  the  vertex  deletion  time  can  be  excessive.  To  see  why,  observe  that  the  time  to  delete  vertices 
for  P  is  at  least  (a  constant  times)  the  length  of  any  path  Q  of  deleted  vertices.  In  Section  2,  R 
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is  layered,  so  |<3|  <  |P|.  This  gives  an  acceptable  bound  on  vertex  deletion  time.  When  R  is  not 
layered,  ]<?!  can  be  larger  than  \P\  —  we  know  no  bound  on  )Q|  except  n  —  1.  Thus  the  vertex 
deletion  time  might  exceed  the  desired  time  bound. 

The  algorithm  overcomes  this  difficulty  with  the  following  approach,  based  on  doubling.  The 
algorithm  starts  with  a  candidate  path  P.  It  determines  the  effect  of  adding  P  to  A  by  tentatively 
deleting  V(P)  and  other  vertices  as  appropriate.  It  checks  if  the  time  to  do  this  is  acceptable.  If 
not,  it  uses  tentatively  deleted  edges  to  construct  a  new  st-path,  over  twice  as  long  as  P.  It  repeats 
the  process  for  the  new  path.  Eventually  an  acceptable  path  is  found  and  added  to  A. 

This  strategy  is  implemented  in  the  algorithm  fincLpath  below,  fincLpath  is  called  on  a  graph 
R,  the  current  reduction  of  D  —  V  (.4).  Its  purpose  is  to  add  one  path  to  A  and  update  R.  The 
In  and  Out  Steps  below  estimate  the  deletion  time  by  tentatively  deleting  vertices  from  R.  These 
tentative  deletions  are  either  made  permanent  in  the  Double  Step,  or  are  ignored.  Throughout  this 
section,  “tentatively  deleting”  a  vertex  or  edge  means  tentatively  deleting  it  from  R. 

procedure  fincLpath. 

Initialize  P  to  be  an  arbitrary  st-path.  Then  repeat  the  following  steps  until  the  Double  Step  adds 
the  desired  path  to  A. 

In  Step.  Tentatively  delete  all  edges  directed  from  V  (P).  Then  tentatively  delete  any  vertex 
whose  indegree  has  dropped  to  zero;  repeat  this  until  every  vertex  of  R  -  s  has  positive  indegree. 
Let  pi  be  the  total  number  of  edges  tentatively  deleted  in  this  step.  Let  P-  be  a  longest  path  of 
edges  tentatively  deleted  in  this  step. 

Out  Step.  Tentatively  delete  all  edges  directed  to  V_(P).  Then  tentatively  delete  any  vertex 
whose  outdegree  has  dropped  to  zero;  repeat  this  until  every  vertex  of  R  —  t  has  positive  outdegree. 
Let  pa  be  the  total  number  of  edges  tentatively  deleted  in  this  step.  Let  P be  a  longest  path  of 
edges  tentatively  deleted  in  this  step. 

Double  Step.  Set  p  *-  pi  +  p0.  Let  P'  be  the  longer  path  of  PI,  P'0.  If  \P'\  <  2(|P|  +  p/p) 
then  make  the  deletions  of  the  In  and  Out  Steps  permanent,  delete  V_(P),  add  P  to  A  and  halt. 
Otherwise  ignore  those  tentative  deletions;  let  5  be  a  path  from  s  to  the  first  vertex  of  P';  let  T 
be  a  path  from  the  last  vertex  of  P'  to  t ;  let  P  be  the  st-path  formed  by  S ,  P'  and  T.  I 

The  correctness  of  fincLpath  amounts  to  the  fact  that  if  the  routine  is  called  with  R  a  nonempty 

v 

reduction  graph,  it  eventually  adds  an  st-path  to  A  and  halts.  In  the  initialization,  path  P  exists 
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6ince  R  is  a  nonempty  reduction  graph.  Similarly,  in  the  Double  Step  paths  5  and  T  exist.  Thus 
every  iteration  of  find-path  constructs  a  longer  st-path.  Hence  find-path  eventually  halts  as  desired. 

Before  analyzing  the  efficiency  of  this  routine,  let  us  observe  that  it  is  not  difficult  to  implement 
find-path.  In  particular  in  the  In  and  Out  Steps,  paths  P-  and  P'0  are  readily  available.  To  see  why, 
consider  for  definiteness  the  Out  Step  and  path  P '0.  For  a  vertex  v  deleted  in  the  Out  Step,  define 
level(v)  as  follows.  If  v  £  V  ( P )  then  level(v)  =  0;  otherwise  level(v)  is  the  smallest  value  such 
that  for  any  edge  vw,  level(w)  <  level(v)  —  1.  Then  the  longest  path  of  edges  deleted  in  the  Out 
Step  starting  at  v  has  length  exactly  level(v).  Furthermore,  such  a  longest  path  can  start  with  any 
edge  vw  where  level(w)  =  level(v)  -  1.  Thus  P'0  can  be  found  if  for  each  vertex  v,  the  algorithm 
records  the  edge  that  caused  its  outdegree  to  drop  to  zero. 

Now  we  estimate  the  efficiency  of  find-path.  Suppose  find-path  performs  J  iterations.  For 
1  <  j  <  J,  let  Pj  be  the  candidate  path  P  in  the  jth  iteration,  and  let  pj  be  the  number  of  edges 
tentatively  deleted  in  the  jth  iteration.  (Path  P  is  constructed  immediately  before  the  jth  iteration; 
Pj  is  the  value  p  computed  in  the  jih  iteration.)  Thus  Pj  is  the  path  that  find-path  adds  to  A  and 
pj  is  the  number  of  edges  actually  deleted  from  R.  Let  Pj+\  =  Pj.  We  shall  see  that  find-path 
can  be  implemented  so  that  the  time  for  the  jth  iteration  is 

0((^/p+|P,+1|)logp).  (3.6) 


This  implies  the  following  bound. 

Lemma  3.9.  The  time  for  one  execution  of  find-path  is  0((pj/p+  |Pj|)logp). 

Proof.  From  (3.6)  it  suffices  to  analyze  the  time  for  the  first  J  —  1  iterations.  From  (3.6)  this 
time  is  0( logp)  times  Mj/p+  Hjsj  |P>+i|.  The  first  summation  is  at  most  *  constant  times 

the  second,  since  the  Double  Step  implies  that  for  j  <  J ,  |P>+i|  >  Pj/p ■  The  second  summation  is 
less  than  2|Pj|,  since  the  Double  Step  implies  that  for  j  <  J ,  |Py+i|  >  2|P>|.  I 

The  Augment  Step  works  by  repeatedly  calling  find-path  until  R  becomes  empty.  Note  that 
when  the  tentative  deletions  become  permanent  in  the  Double  Step,  R  becomes  the  new  reduction 
graph.  Hence  the  entry  condition  for  the  next  call  to  findjpath  is  satisfied.  A  crucial  part  of  the 
algorithm  that  is  still  unspecified  is  how  R  is  initialized  when  the  Augment  Step  begins  (i.e.,  before 
the  first  call  to  find-path).  Excluding  that,  it  is  dear  that  the  Augment  Step  works  correctly.  The 
total  time  used  is  the  sum  of  the  bounds  of  Lemma  3.9,  which  equals  (3.5). 
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Now  we  describe  how  R  is  initialized  when  the  Augment  Step  begins.  The  first  Augment  Step 
of  match  is  simple:  There  are  no  matched  edges,  whence  R  is  the  residual  graph  of  the  eligible 
edges.  For  an  Augment  Step  after  the  first,  the  following  routine  is  used.  In  addition  to  finding  R 
it  constructs  a  path  P  to  be  used  as  the  first  candidate  path  in  find-path.  Hence  the  routine  ends 
by  skipping  the  initialization  of  fincLpath  and  going  directly  to  the  In  Step  of  find-path, 

procedure  findJRP. 

R  Step.  Let  7  be  the  forest  of  the  preceding  Relaxed  Hungarian  Search.  Do  a  breadth-first 
search  of  the  eligible  edges,  as  follows:  Start  the  search  from  the  free  vertices  of  Vj  (not  Vo).  Stop 
the  search  upon  reaching  the  first  breadth-first  level  L  that  does  not  contain  a  vertex  of  7.  Set 
V(R)  to  the  vertices  of  7  reached  in  the  search.  Construct  E(R)  from  the  eligible  edges  that  join 
vertices  of  R. 

P  Step.  Let  v  be  a  vertex  of  7  in  the  level  preceding  L.  Let  S  be  the  alternating  path  of  7  from 
a  free  vertex  of  Vo  to  v.  Let  T  be  the  alternating  path  of  the  above  breadth-first  search,  from  v  to 
a  free  vertex  of  Vj.  Set  P  to  be  the  st-path  in  R  that  corresponds  to  5  followed  by  T.  Go  to  the 
In  Step  of  find-path.  I 

Now  we  show  that  find-RP  is  correct.  Observe  that  the  R  Step  constructs  R  correctly:  When 
the  Relaxed  Hungarian  Search  halts,  as  noted  in  Section  3.2,  7  contains  all  vertices  that  are  on  an 
augmenting  path  of  eligible  edges.  Hence  7  contains  V(R).  Thus  a  vertex  is  in  R  if  and  only  if  it 
is  joined  to  a  free  vertex  of  Vj  by  an  alternating  path  of  eligible  edges  containing  only  vertices  of 
7.  Such  a  vertex  is  reached  by  level  L  in  the  breadth-first  search.  This  implies  that  R  is  initialized 
correctly. 

In  the  P  Step  it  is  clear  that  the  constructed  path  P  exists  and  is  in  R.  Hence  find-RP  is 
correct. 

The  time  for  findJRP  is  0((m/p  +  |Pj)logp),  since  (Pj  >  [T|  =  \L\  -  1.  The  first  augmenting 
path  constructed  by  find-path  will  be  at  least  as  long  as  the  path  that  it  starts  with,  which  is  the 
P  constructed  by  find-RP.  Hence  fincLRP  runs  within  the  desired  bound  (3.5). 

It  remains  to  describe  the  parallel  implementation  of  find-path.  It  can  be  implemented  so  the 
time  for  one  execution  is  given  by  (3.6).  As  with  the  Relaxed  Hungarian  Search  we  leave  most  of 
the  implementation  details  to  the  reader.  The  algorithm  uses  techniques  similar  to  the  Augment 
Step  of  Section  2.  Tentative  deletions  are  done  using  vertex  and  scan  lists,  and  the  paths  S  and  T 
are  found  by  the  greedy  strategy.  In  the  data  structure,  each  vertex  v  of  G  stores  eight  quantities: 
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its  indegree  and  outdegree;  &  tentative  indegree  and  tentative  ontdegree;  a  pointer  to  the  undeleted 
eligible  unmatched  edge  that  is  first  in  its  adjacency  list  (if  any);  a  pointer  to  the  vertex  to  which 
it  is  matched  (if  any);  if  v  is  deleted  in  the  In  Step,  a  pointer  to  an  edge  directed  to  v  in  a  longest 
path  of  deleted  edges  to  v;  a  similar  pointer  to  an  edge  directed  from  v. 

This  completes  the  derivation  of  Theorem  1.1.  I 


4.  Extensions. 

This  section  first  presents  an  efficient  algorithm  for  shortest  paths  in  a  directed  graph  with 
arbitrary  integral  edge  lengths.  Then  it  generalizes  the  assignment  algorithm  to  the  minimum  cost 
degree-constrained  subgraph  problem. 


4.1.  Optimum  duals  and  shortest  paths. 

Some  applications  of  matching  require  the  optimum  linear  programming  duals.  We  begin  by 
showing  how  such  duals  can  be  derived  from  r-optimal  duals.  Then,  as  an  example,  we  si  *w  how 
this  gives  an  efficient  shortest  path  algorithm. 

Let  G+  be  G  with  an  additional  vertex  s  €  Vo  and  an  edge  sv  for  each  v  £  Vj.  Extend  the 
given  cost  function  c  to  G+  by  defining  c(sr)  as  an  arbitrary  integer;  the  cost  function  used  by 
the  main  routine  of  our  algorithm  extends  to  G+  by  its  definition,  c  =  (r  +  n  +  l)c.  To  specify  a 
cost  function  on  G+  we  write  G+;c  or  G+;c.  Let  M  be  a  minimum  perfect  matching  on  G\  for 
vertex  t;  let  t/  denote  its  mate,  i.e.,  vv'  €  M.  For  v  6  Vo  let  Mv  be  a  minimum  perfect  matching 
on  G+  —  v ;  c.  (Such  a  matching  exists,  e.g.,  M  —  vv'  +  av' .)  Optimum  linear  programming  duals 
are  given  by 

y(v)  =  if  v  €  Vo  then  -  e(Mv)  else  c(w')  -  y(v’). 

(This  can  be  proved  by  an  argument  similar  to  the  algorithm  given  below.  Alternatively  see  [G87] 
for  a  proof  from  first  principles.) 

Recall  the  ordinary  Hungarian  search  described  in  Section  3.2.  Suppose  such  a  search  is  done 
on  G+;c,  with  matching  M.  It  halts  with  a  tree  T  of  eligible  edges,  rooted  at  s.  The  construction 
of  G+  implies  T  is  a  spanning  tree.  For  any  v  P  V?.  augmenting  along  the  su-path  in  T  gives  an 

v 

r-optimal  matching  Nv  on  G+  -  u;c.  Nv  is  a  minimum  perfect  matching  on  G+  -  v;  c.  This  follows 
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from  Lemma  3.1,  since  G+  —  v  and  G  have  the  same  number  of  vertices.  Hence  Nv  qualifies  as  the 
above  Af„. 

This  implies  the  following  procedure  to  find  optimum  linear  programming  duals.  Given  is  the 
output  of  our  matching  algorithm,  i.e.,  an  r-optimal  matching  on  G\c  with  matching  M  and  dual 
function  y.  Form  G+\c,  defining  c(sv)  =  fy(u)/(r  +  n  +  1)]  for  each  v  €  V\.  Set  y(s)  «—  0  to 
get  r-feasible  duals.  Do  an  ordinary  Hungarian  search  to  construct  a  spanning  tree  T  of  eligible 
edges  rooted  at  s.  For  a  vertex  v  6  Vqi  let  P  denote  its  path  to  the  root  in  T.  Compute  v’s  linear 
programming  dual  as  c(M)  +  c(P  D  M )  —  c(P  —  M).  Compute  the  dual  of  a  vertex  of  V\  using  the 
above  formula. 

This  algorithm  can  be  implemented  in  time  0((m/p  +  n)logp).  The  Hungarian  search  is 
implemented  as  in  match.  Note  that  the  definition  of  c(sv )  ensures  A  <  n.  The  duals  are  found  by 
a  depth-first  traversal  of  T  using  one  processor. 

Corollary  4.1.  Optimum  linear  programming  duals  on  a  bipartite  graph  can  be  found  in  the 
bound  of  Theorem  1.1.  I 

One  application  of  these  duals  is  to  solve  a  shortest  path  problem. 

Theorem  4.1.  The  single-source  shortest  path  problem  on  a  directed  graph  n  vertices,  m  edges, 
and  arbitrary  integral  edge  lengths  can  be  solved  in  the  bound  of  Theorem  1.1. 

Proof.  This  problem  can  be  solved  by  finding  optimum  linear  programming  duals  for  a  bipartite 
graph  whose  costs  are  the  edge  lengths,  and  then  running  Dijkstra’s  algorithm  [G85].  The  latter 
can  be  implemented  in  time  0((m/p  +  n)  log  N  logp)  using  the  scaling  algorithm  of  [G85j.  I 


4.2.  Degree-constrained  subgraphs. 

This  section  presents  our  algorithm  for  finding  a  minimum  perfect  DCS.  The  algorithm  gener¬ 
alizes  the  matching  algorithm  of  Section  3.  Here  we  concentrate  only  on  the  aspects  of  the  algorithm 
that  are  new.  The  analysis  uses  the  same  sequence  of  lemmas  as  Section  3.  Most  of  the  proofs  here 
give  only  the  facts  needed  to  extend  the  argument  of  Section  3.  The  section  also  discusses  several 

other  DCS  algorithms.  Recall  that  the  function  u  specifies  the  degree  constraints;  we  denote  a  DCS 

% 

by  D,  and  the  function  d  specifies  the  degree  of  a  vertex  in  D. 
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Define  a  relaxation  function  to  be  a  function  p  :  V  — *  N  (for  N  the  nonnegative  integers) 
that  has  p(v)  =  0  for  each  r  £  Vj.  (The  intent  is  that  p(v )  equals  the  total  amount  that  y(v) 
has  decreased  in  relax  operations.)  Note  that  in  expressions  such  as  p(D)  a  vertex  v  contributes 
p{v)d(v). 

An  r-feasible  DCS  consists  of  a  degTee-constrained  subgraph  D,  a  nonnegative  integer  r,  a  dual 
function  y  and  a  relaxation  function  p  such  that 


y(e)  <  c(e)  +  1, 

e*D; 

(4-la) 

y(e)  >  c(e)  -p(e), 

e  G  D; 

(4.16) 

p(D)  <  r. 

(4.1c) 

An  r-optimal  DCS  is  a  perfect  DCS  that  is  r-feasible. 

Lemma  4.1.  If  some  integer  larger  than  r  +  n  divides  each  cost  c(e)  evenly,  then  any  r-optimal 
DCS  is  a  minimum  perfect  DCS. 

Proof.  Use  the  characterization  that  a  perfect  DCS  D  has  minimum  cost  if  and  only  if  any 
alternating  cycle  C  has  c(C  fl  D)  <  c(C  -  D).  I 

The  algorithm  is  again  stated  using  the  integer  parameters  r,  b  =  3r  +  5U  and  g.  We  eventually 
choose  r,  b  —  Q{U),  g  =  9(logt/). 

The  main  routine  and  the  scale~match  routine  work  exactly  as  in  matching.  The  desired  time 
bound  for  the  algorithm  follows  if  each  scale  runs  in  time 

0((VU m/ p  +  U  log 7U)logp).  (4.2) 

Let  D_  be  the  r-optimal  matching  of  the  previous  scale.  (For  the  first  scale,  D_  is  any  perfect 
DCS  )  The  costs  input  to  match  have  these  properties: 

(а)  Any  edge  not  in  D L  costs  at  least  —  1. 

(б)  Any  subset  of  E{D.  )  costs  at  most  2r  +  317. 

The  proof  of  (6)  uses  the  nonnegativity  of  p. 

In  the  match  routine,  edge  e  is  eligible  if  equality  holds  in  (4.1a)  (for  e  $  D)  or  (4.16)  (for  e  6  D). 
The  match  routine  differs  from  the  one  in  Section  3  in  two  respects:  The  first  is  initialization.  Each 
relaxation  amount  p(v)  is  set  to  0.  Further,  the  DCS  D  is  initialized  to  {e)c(e)  <  -!}• 


The  second  difference  is  the  Augment  Step.  Define  an  ap-set  to  be  a  set  of  edge-disjoint 
augmenting  paths  of  eligible  edges,  such  that  any  vertex  v  is  an  end  of  at  most  u(v)  -  d(v)  paths. 
(In  a  multigraph,  “edge-disjoint”  means  a  given  copy  of  an  edge  is  in  at  most  one  path.)  The 
Augment  Step  finds  a  maximal  ap-set  and  augments  the  DCS  along  each  path.  Unlike  Section  3, 
no  duals  are  changed  after  an  augment. 

The  properties  of  the  Search  and  Augment  Steps  are  similar  to  Section  3,  with  these  changes. 
The  definition  of  /  is  changed  to  the  deficiency  of  the  DCS,  i.e., 

/  =u(V0)  ~  d(V0). 

In  addition  to  relation  (3.3),  any  free  vertex  v  €  Vo  has  p(v)  =  0  (it  is  never  relaxed). 

The  correctness  of  match  follows  as  in  Section  3,  using  these  observations  to  show  r- feasibility: 
The  initialization  of  D  guarantees  (4.1a);  also  the  initial  D  is  included  in  D_  by  property  (a),  so 
it  satisfies  the  degree  constraints.  Finally,  the  Augment  Step  does  not  increase  p{D),  since  a  free 
vertex  v  has  p(v)  =  0. 

The  analysis  of  the  efficiency  of  match  follows  Section  3: 

Lemma  4.2.  At  any  time  during  match,  f  A  <  b. 

Proof.  Consider  the  expression  Y  =  y(D_  -  D)  -  y(D  -  D_).  I 
Lemma  4.3.  I<2Vb+l. 

Proof.  In  the  Augment  Step  the  edges  on  an  augmenting  path  become  ineligible,  by  the  definition 
of  “eligible”  and  the  nonnegativity  of  p.  This  implies  that  any  Hungarian  search  does  a  dual 
adjustment.  I 

Lemma  4.4.  A,  <  b  +  61ogU.  I 

Lemma  4.5.  A  <  U  +  b  +  61ogf7. 

Proof.  By  the  argument  of  Lemma  3.5  and  the  nonnegativity  of  p,  A;  >  A  +  c(Du)-c(Dq). 
The  initialization  of  match  shows  that  am  edge  in  Du  -  Z?o  costs  at  least  -1.  So  Lemma  4.4  implies 
the  desired  bound.  I 

Now  we  turn  to  the  Relaxed  Hungarian  Search.  It  differs  from  Section  3  as  a  consequence  of 

* 

the  fact  that  an  edge  of  D  need  not  be  eligible.  Hence  the  search  forest  T  is  grown  edge-by-edge. 
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rather  than  in  pairs  of  unmatched  and  matched  edges.  Thus  the  dual  adjustment  quantity  is  defined 


as 

6  =  min{c(e)  +  1  -  y(e)j  e  g  D,  e0  6  7,  g  7} 

u  (y(«)  -  c(«)  +  P(e)  1  e  £  D,  eo  i  7,  ex  £  -F}. 


(4.3) 


To  relax  a  set  of  nonfree  vertices  5  C  Vo  means  to  decrease  y(v)  by  1  and  increase  p(v)  by 
1,  for  each  v  £  S.  This  operation  makes  any  non-D-edge  that  is  incident  to  S  ineligible.  It 
does  not  change  the  eligibility  of  any  .D-edge.  Concerning  r-feasibility,  a  relax  operation  preserves 
(4.1a}-(4.1A).  Concerning  (4.1c),  it  increases  p(D)  by  ti (S). 

The  Relaxed  Hungarian  Search  works  as  follows.  It  initializes  the  search  forest  7  to  contain 
the  free  vertices  of  Vo-  Then  it  repeats  the  following  steps  until  the  Adjust  Step  halts  with  7  as 
desired. 


Adjust  Step.  Set  Wj  <—  {ejl  some  eligible  edge  e  g  D  has  e0  6  7,  ex  £  JF), 

Wo  (eol  some  eligible  edge  e  £  D  has  e\  €  W\,  eo  7},  kV2  (eo|  some  eligible  edge  e  € 

D  has  ej  £7,  eo  $  7}.  If  Wi  U  W2  =  0  and  7  contains  a  free  vertex  of  V) ,  halt.  If  Wi  U  W>  =  0 
and  7  does  not  contain  a  free  vertex  of  Vi,  do  a  dual  adjustment  and  repeat  this  step. 

Grow  Step.  For  each  vertex  w  £  Wx  U  W2,  add  an  appropriate  eligible  edge  vw  (v  £  7)  to  7\ 

then  do  the  same  for  each  w  £  W0.  If  u(Wq  U  W2)  <  f/g  then  relax  W0  U  W2.  I 


This  routine  works  analogously  to  the  one  in  Section  3.  Note  that  after  the  Grow  Step  adds 
edges,  an  eligible  edge  with  exactly  one  vertex  in  7  is  either  a  non-D-edge  or  is  incident  to  WoU  Wo. 
If  a  relax  operation  is  done,  it  makes  the  non-Z?-edges  incident  to  Wo  U  W2  ineligible.  Hence  the 
next  Adjust  Step  halts  or  does  a  dual  adjustment. 

As  in  Section  3  we  will  choose  parameter  r  as  an  upper  bound  to  p{D).  The  correctness  of  the 
Relaxed  Hungarian  Search  is  proved  as  in  Section  3. 

Now  we  establish  the  two  properties  needed  by  the  Augment  Step.  The  first  is  that  7  contains 
all  vertices  that  are  on  an  augmenting  path  of  eligible  edges.  When  the  search  halts  an  eligible 
edge  e  with  exactly  one  vertex  in  7  is  either  a  Z?-edge  with  eo  €  7  or  a  non-2?-edge  with  ex  £  7. 
Since  an  augmenting  path  starts  at  a  vertex  of  Vo(^)  and  is  alternating,  it  cannot  leave  7  on  such 
an  edge. 

The  second  property  is  acyclicity: 


Lemma  4.6.  la  match  there  is  never  an  alternating  cycle  of  eligible  edges. 


Proof.  Consider  an  alternating  cycle  of  eligible  edges  C.  C  does  not  exist  after  the  initialization 
of  match,  since  every  D-edge  is  ineligible.  C  is  not  created  by  an  Augment  Step  or  relax  operation, 
since  neither  creates  an  eligible  edge.  It  remains  only  to  show  that  C  is  not  created  by  a  dual 
adjustment. 

A  dual  adjustment  can  create  an  eligible  edge  c  that  has  exactly  one  of  its  vertices  in  7 . 
Suppose  C  contains  such  an  e.  Let  /  be  the  first  edge  after  e  in  C  with  exactly  one  vertex  in  7 . 
The  two  possibilities  for  e  are  e  £  D  with  eo  €  7,  or  e  G  D  with  ei  €  7.  In  either  case  since  C  is 
alternating,  the  two  possibilities  for  /  are  /  ^  D  with  f\  €  7,  or  /  6  D  with  /o  €  7.  But  such  an 
edge  is  ineligible  after  the  dual  adjustment.  I 

Now  we  analyze  the  efficiency  of  the  Relaxed  Hungarian  Search. 

Lemma  4.7.  H  =  0(b  +  gU  log  U). 

Proof.  The  only  change  in  the  argument  is  the  definition  of  possibility  (i)  .  It  becomes:  (i)  The 
iteration  increases  u(V0 (7))  by  at  least  f/g.  This  occurs  at  most  gU / f  times  in  a  given  search.  I 


Lemma  4.8.  At  any  time,  p(D)  <  (4b\ogU)/g. 

Proof.  A  relax  operation  increases  p(D )  by  at  most  f/g.  I 

The  rest  of  the  development  —  choosing  parameters,  implementing  of  the  Relaxed  Hungarian 
Search,  the  Augment  Step  and  its  analysis  —  is  entirely  analogous  to  that  in  Section  3. 

The  following  result  is  derived  by  proceeding  as  in  Section  3. 

Theorem  4.2.  A  minimum  perfect  degree-constrained  subgraph  on  a  bipartite  multigraph  can 
be  found  in  time  0(\/(7m  log  (nN)(  log  p)/p)  and  space  0(m),  for  p  <  log  2n).  I 

Now  consider  the  general  minimum  cost  degree-constrained  subgraph  problem  on  a  bipartite 

multigraph  G.  Each  vertex  «  has  given  degree  bounds  l(v)  and  u(v).  Also  given  are  bounds  on 

the  cardinality  of  the  solution,  L  and  H.  We  seek  a  degree-constrained  subgraph  (i.e.,  for  each 

vertex  v ,  t(v)  <  d(v)  <  u(v))  that  has  minimum  cost  subject  to  the  restriction  that  it  contains 

% 

between  L  and  H  edges  (inclusive).  Note  that  special  cases  of  this  problem  include  minimum  cost 
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matching  (/  =  0,u  =  1,L  =  0,  E  =  n),  minimum  cost  cardinality-*  matching  (change  L  and  H  to 
k ),  minimum  cost  maximum  cardinality  matching,  and  similar  DCS  problems. 

It  is  straightforward  to  reduce  this  general  problem  on  a  multigraph  G  to  a  minimum  perfect 
DCS  problem  on  a  multigraph  Gm.  For  t  =  0,1,  define  i'  =  1  —  i  and  add  a  vertex  a*  to  VJ.  For 
each  v  €  -  a,-,  add  edge  s,t>  with  multiplicity  u(v)  —  £(v)  and  cost  zero.  Also  add  edge  so3i  with 

multiplicity  H  —  L  and  cost  zero.  Set  u(sj)  =  tt(VV  —  s,<)  —  L.  It  is  easy  to  see  ,hat  a  minimum 
perfect  DCS  on  G *  is  a  solution  to  the  general  problem.  This  gives  the  following  result.  Define  U 
as  the  sum  of  the  upper  bounds  of  the  given  multigraph  G. 

Corollary  4.2.  The  general  minimum  cost  degree-constrained  subgraph  problem  on  a  bipartite 
multigraph  can  be  solved  in  time  0(\/rUmlog(nN)(\ogp)/p)  and  space  0(m),  for 
P  <  m/(y/U  log  2n).  I 
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